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Abstract 

We develop a mixture procedure for monitoring parallel streams of data for a change-point 
that affects only a subset of them, without assuming a spatial structure relating the data 
streams to one another. Observations are assumed initially to be independent standard nor- 
mal random variables. After a changepoint the observations in a subset of the streams of 
data have non-zero mean values. The subset and the post-change means are unknown. The 
procedure we study uses stream specific generalized likelihood ratio statistics, which are com- 
bined to form an overall detection statistic in a mixture model that hypothesizes an assumed 
fraction po of affected data streams. An analytic expression is obtained for the average run 
length (ARL) when there is no change and is shown by simulations to be very accurate. Simi- 
larly, an approximation for the expected detection delay after a change-point is also obtained. 
Numerical examples are given to compare the suggested procedure to other procedures for 
unstructured problems and in one case where the problem is assumed to have a well defined 
geometric structure. Finally we discuss sensitivity of the procedure to the asssumed value of 
Po and suggest a generalization. 



1 Introduction 

Single sequence problems of change-point detection have a long history in industrial quality control, 
where an observed process is assumed initially to be in control and at a change-point becomes out 
of control. It is desired to detect the change-point with as little delay as possible subject to the 
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constraint that false detections occurring before the true change-point are very rare. Outstanding 
early contributions are due to Page [Pag54], [Pag55], Shiryaev [Shi63], and Lorden [Lor71]. 

We assume there are parallel streams of data subject to change-points. More precisely suppose 
that for each n — 1, . . . , N, we make observations y n> t, t = 1,2,.... The observations are mutually 
independent within and across data streams. At a certain time k, there are changes in the distri- 
butions of observations made at a subset M C {1, . . . , N} of cardinality \J\f\ < N. Also denote by 
M c the set of unaffected data streams. The change-point k, the subset M and its size, and the size 
of the changes are all unknown. As in the case of a single sequence N = 1, the goal is to detect the 
change-point as soon as possible after it occurs, while keeping the frequency of false alarms as low 
as possible. In the change-point detection literature, a surrogate for the frequency of false alarms 
is the average- run-length (ARL), defined to be the expected time before incorrectly announcing a 
change of distribution when none has occurred. 

It may be convenient to imagine that the data streams represent observations made by a col- 
lection of N sensors, that the change-point is the onset of a localized signal that can be detected 
by sensors in the neighborhood of the signal. However, the problem is unstructured in the sense 
that there is no model that relates the changes seen at the different sensors. In a structured prob- 
lem there exists a profile determining the relative magnitudes of the changes observed by different 
sensors, say according to their distance from the location of a signal, cf. [SY08]. For discussions of 
unstructured problems and applications see [TV08], [McilO], [CGV+10], [PRT03], and [LLR09]. 

The detection problem of particular interest in this paper involves the case that N is large and 
|JV| is relatively small. To achieve efficient detection, the detection procedure should use insofar 
as possible only useful information from affected sensors and suppress noise from the unaffected 
sensors. 

In analogy with the well known CUSUM statistic (e.g., Page [Pag54], [Pag55], Lorden [Lor71]), 
Mei [McilO] recently proposed a multisensor procedure based on sums of the CUSUM statistic from 
individual sensors. He then compares the sum with a suitable threshold to determine a stopping rule. 
While the distributions of the data, both before and after the change-point, are completely general, 
they are also assumed to be completely known. The method is shown to minimize asymptotically 
the expected detection delay for a given false alarm rate, when the threshold value (and hence the 
constraint imposed by the ARL) becomes infinitely large. The procedure fails to be asymptotically 
optimal when the specified distributions are incorrect. Tartakovsky and Veeravalli proposed a 
different procedure [TV08] that sums the local likelihood ratio statistic before forming a CUSUM 
statistics. They also assume the post-change distributions are completely prescribed. Moreover, 
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both procedures assume the change-point is observed by all sensors. When only a subset of sensors 
observe the change-point, these procedures include noise from the unaffected sensors in the detection 
statistic, which may lead to long detection delays. 

In this paper, we develop a mixture procedure that achieves good detection performance in the 
case of an unknown subset of affected sensors and incompletely specified post-change distributions. 
The key feature of the proposed procedure is that it incorporates an assumption about the fraction 
of affected sensors when computing the detection statistic. We assume that the individual obser- 
vations are independent and normally distributed with unit variance, and that the changes occur 
in their mean values. At the tth vector of observations (y n j, n = 1, . . . , N), the mixture procedure 
first computes a generalized likelihood ratio (GLR) statistic for each individual sensor under the 
assumption that the change-point occurs at k < t. Then the local GLR statistics are combined via a 
mixture model that depends on p , the hypothesized fraction of affected sensors. The local statistics 
are then summed and compared with a detection threshold. To characterize the performance of the 
mixture procedure, we derive analytic approximations for its ARL and expected detection delay, 
which are evaluated by comparing the approximations to suitable simulations. Since evaluation of 
the ARL by simulation is quite time consuming, the analytic approximation to the ARL proves very 
useful in determining a suitable detection threshold. The proposed procedure is then compared to 
competing procedures via simulation and is shown to be very competitive in unstructured problems. 
It is also shown to be reasonably robust to the choice of po, and a hierarchical mixture procedure 
is suggested for cases when a single value of po seems inadequate. 

Although we assume throughout that the observations are normally distributed, the model 
can be generalized to an exponential family of distributions satisfying some additional regularity 
conditions. 

The remainder of the paper is organized as follows. In Section 2 we establish our notation and 
formulate the problem more precisely. In Section 3 we review several detections procedures and 
introduce the proposed mixture procedure. In Section 4 we derive the theoretical ARL and expected 
detection delay of the mixture procedure, and demonstrate with numerical examples that these 
approximations are reasonably accurate. In Section 5, we demonstrate that the mixture procedure 
performs well compared to other procedures in the unstructured problem. We also compare the 
mixture procedure to that suggested by [SY08] in a structured problem. Finally Section 7 concludes 
the paper with some discussion. 
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2 Assumptions and Formulation 



Given N sensors, for each n = 1, 2, • • • , N, the observations from the nth sensor are given by y n> t, 
t = 1, 2, • • • . Assume that different observations are mutually independent and normally distributed 
with unit variances. Under the hypothesis of no change, they have zero means. The probability 
and expectation of this case are denoted by P°° and E°°, respectively. Alternatively, there exists 
a change-point k > and a subset M of {1,2,..., N} with cardinality |jV| such that for n G A/", 
the observations y njt at sensors affected by the change-point, have means equal to fi n > for all 
t > n, while observations from the unaffected sensors keep the same standard normal distribution. 
The probability and expectation of this case are denoted by P K and E K , respectively. Note that 
this probability depends on M and the values of /i n , although this dependence is suppressed in the 
notation. The fraction of affected sensors is given by p = \J\f\/N. 

Our goal is to define a stopping rule T such that for all sufficiently large prescribed constants 
c > 0, E°°{T} > c, while asymptotically E K {T — k\T > k} is a minimum. Ideally, the minimization 
would hold uniformly in the various unknown parameters: k, M and the u n . Since this is clearly 
impossible, in Section 5 we will compare different procedures through numerical examples computed 
under various hypothetical conditions. 

3 Detection Procedures 

Since the observations are independent, for an assumed value of the change-point k = k and sensor 
n G Af, the log-likelihood of observations accumulated by time t > k is given by 

t 

£ n (t,k,H n )= (HnVn,i ~ /4/ 2 ) ■ (!) 

i=k+l 

We assume that each sensor is affected by the change with probability po (independently from 
one sensor to the next). The global log likelihood of all N sensors is 

^log{l -p +p exp[e n (t,k,/j, n )]} . (2) 

n=l 

The expression (2) suggests several change-point detection rules. 

One possibility is to set \i n equal to a nominal change, say 5 > 0, which would be important to 
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detect, and define the stopping rule 

T\ = inf < t : max log{l - po + Po exp[£+(t, k, 5)] > b > , (3) 



n=l 



where x + denotes the positive part of x. Here thresholding by the positive part plays the role of 
dimension reduction by limiting the current considerations only to sequences that appear to be 
affected by the change-point. 

Another possibility is to replace \i n by its maximum likelihood estimator, as follows. The 
maximum likelihood estimate of the post-change mean as a function of the current number of 
observations t and putative change-point location k is given by 



t 



fJ>r, 



EM /(*-*)• ( 4 ) 



=fc+i 

Substitution into (1) gives the log generalized likelihood ratio (GLR) statistic. Putting 



t 

(5) 

U n ,k,t — (t — k) l l 2 (S n) t — S n7 k), 



we can write the log GLR as 



e n (t,k,M = (uZ ktt ) 2 /2- (6) 



We define the stopping rule 

T2 = inf \ 1 ' n ^X bg & ~ Po + ^exp[(^ fc , t ) 2 /2]) > b\ . (7) 



N 



0<k<t ■ 

n=l 



Remark. In what follows we use a window limited version of (7), where the maximum is restricted 
to m <t — k < mi for suitable m < mi. The role of m\ is two-fold. On the one hand it reduces 
the memory requirements to implement the stopping rule, and on the other it effectively establishes 
a minimum level of change that we want to detect. For asymptotic theory given below, we assume 
that b — > oo, with m\/b also diverging. More specific guidelines in selecting m\ are discussed below. 
In the numerical examples that follow we take tuq = 1. In practice a slightly larger value can be 
used to provide protection against outliers in the data, although it may delay detection in cases 
involving very large changes. 
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The detection rule (7) is motivated by the suggestion of [ZYS11] for a similar fixed sample 
change-point detection problem. 

For the special case po = 1, (7) becomes the (global) GLR procedure, which for N = 1 was 
studied by [SV95]. It is expected to be efficient if the change-point affects a large fraction of the 
sensors. At the other extreme, if only one or a very small number of sensors is affected by the 
change-point, a reasonable procedure would be 



T max = inf It : max max (U+ k t ) 2 /2 >b\. 

y 0<k<tl<n<N n ' K ' 1 J 



The stopping rule T max (p ) can also be window limited. 

Still other possibilities are suggested by the observation that a function of y of the form log[l 
Po +Po ex P(?/)] large only if y is large, and then this function is approximately equal to [y + \og{po)Y 
This suggests the stopping rules 



T 3 = inf \ t : max Y][£ n (t, k, 5) + \og(p )} + > b \ (9) 



n=l 



and 

N 



T 4 = inf It : max V[(f/^//2 + log(p )] + > b , (10) 



0<fc<f 

n=l 



or a suitably window limited version. 

Mei ([McilO]) suggests the stopping rule 



T Me i = inf Y : y~] max £ n (t, k, 5) > bj , (11) 

which simply adds the classical CUSUM statistics for the different sensors. Note that this procedure 
fails to take advantage of the assumption that all distributions affected by the change-point change 
simultaneously. As we shall see below this failure has a negative impact on the efficiency of the 
procedure, although it might prove beneficial in differently formulated problems. For example, there 
may be a time delay before the signal is perceived at different sensors, or there may be different 
signals occurring at different times in the proximity of different sensors. In these problems, Mei's 
procedure, which allows changes to occur at different times, could be useful. 
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The procedure suggested by Tartakovsky and Veeravalli [TV08] is denned by the stopping rule 



This stopping rule resembles T 3 (p ) with p = 1, but with one important difference. After a change- 
point the statistics of the unaffected sensors have negative drifts that tend to cancel the positive 
drifts from the affected sensors. This can lead to a large expected detection delay. Use of the 
positive part, [£ n (t, k, S)] + , in the definitions of our stopping rules is designed to avoid this problem. 

Different thresholds b are required for each of these detection procedures to meet the ARL 
requirement. 

4 Properties of the Detection Procedures 

In this section we develop theoretical properties of the detection procedures T\ to T 4 , with emphasis 
on Ti. We use two standard performance metrics: (i) the expected value of the stopping time 
T when there is no change, usually called the average run length or ARL and (ii) the expected 
detection delay in the extreme case where a change occurs immediately at K — 0. This provides 
an upper bound on the expected detection delay when a change occurs later in the sequence of 
observations. The approximation to the ARL will be shown below to be very accurate, which is 
fortunate since its simulation can be quite time consuming. 

4.1 Average run length when there is no change 

The ARL is the expected value of the stopping time T when there is no change-point. It will be 
convenient to use the following notation. Let g(x,po) = log(l — po + Po exp[(x + ) 2 /2]), and put 




(12) 



i>(6) =logE{exp[6g(U, Po )]}, 



(13) 



where U has a standard normal distribution. Also let 



7 (0) = ^e 2 E{[mPo)} 2 exp[6g(U,p ) -#?)]}■ 



(14) 
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Here the dot denotes differentiation (with respect to the first argument for a function of more than 
one variable). Let 

mo,Po) = g[ ^ff If ex P - (15) 

Denote the standard normal density function by 4>(x) and its distribution function by <&(x). Also let 
v(x) = 2x~ 2 exp 

[~2J2T n ~ 1( ^ {-\x\n 1/2 /2)] (cf. [Sie85], p. 82). For numerical purposes a simple, 
accurate approximation is given by (cf. [SY07]) 

v ( x) ~ (2/x)[$(x/2)-0.5] 



(x/2)$(x/2) + <j>(x/2)' 



To state our approximation to the ARL of T 2 , we assume that N — > oo and b — > oo with b/N 
fixed. Consider the window limited mixture stopping rule (7) with mi = o(b r ) for some positive 
integer r and define 9 by ip{9) = b/N. Then 



/.[2JV 7 09)/mo] 1/a 

E°°{T 2 }~f(N,9,po)/ / ^ 2 (y)rfy- (16) 

J[2AT^(e)/m 1 l 1 / 2 



-[2A r 7(9)/m ] 1 / 2 

Remark. The integrand in the approximation is integrable at both and oo by virtue of the 
relations u(y) — > 1 as y — > 0, and z/(y) ~ 2/y 2 as y — )■ oo. 

The following calculations provide support for the approximation in (16). For detailed proofs in 
similar problems see [SV95] or [SY08]. From arguments similar to those used by [ZYS11], we can 
show that 



P°°{T 2 < m} 

= P°°{ max y^g(U nAt ;p )>b 

t<m,nio<t—k<mi ' ' 
^ n=l 

/•mi/m 

~ ^e-^^-^Wl^iV^^-^irS^) / v 2 {[2Ny(6) /{mt)] 1 ' 2 ) (1 - t)dt/t 2 . 

J mo/m 



(17) 



Here it is assumed that m is large, but small enough that the right hand side of (17) converges to 
when b — > oo. Assume the maximum window size m\ is small compared to m. Changing variables 
in the integrand and using the notation (15), we can re- write this approximation as 

r [2Nj(e)/m ] !/ 2 

P°°{T 2 < m} ~ m / yAy)dy/f(N,9,Po). (18) 

V[2JV7(0)/mi]V2 

From the arguments in [SV95] or [SY08] (see also [Ald.88]), we see that T 2 is asymptotically ex- 
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ponentially distributed. Hence if A denotes the factor multiplying m on the right hand side of 
(18), then for still larger m, in the range where mX is bounded away from and oo, P°°{T2 < 
m} — [1 — exp(— Am)] — > 0. Consequently, uniform integrability implies that E°°{T 2 } ~ A -1 , which 
is equivalent to (16). 

A simultation illustrating the fact that T 2 is approximately exponentially distributed is given in 
Section 4.3. 

Remarks, (i) The result we have used from [ZYS11] was motivated by a problem involving changes 
that could be positive, or negative, or both and in that paper it was assumed that the function 
g(x,po) is twice continuously differentiable in x. The function g(x,po) defined above for one-sided 
changes, which are more natural in our problem, and have been used in the formulations of earlier 
authors, does not have this much smoothness. We can, however, approximate the indicator of 
x > by <&(rx), as r — > oo, and thus use in place of x + in the definition of g the smooth function 
<&(rv)dv = x<&{rx) + r _1 0(rx), which converges uniformly to x + as r — > oo. Using the cited 
result, then letting r — > oo and interchanging limits produces (17). An alternative approach would 
be simply to define g(x,po) to be appropriate for a one sided change while having the required 
smoothness in x. A simple example is g(x,po) = log[l — po + Po exp(x 2 /2)$(x)]. 

(ii) The fact that the stopping times T studied in this paper are asymptotically exponentially 
distributed can be very useful in simulating the ARL. It is not necessary to simulate the process 
until T, which can be computationally time consuming, but only until we are able to estimate 
P°°{T < m} with a reasonably small percentage error. For the numerical examples given later, we 
have occasionally used this shortcut with the value of m that makes this probability 0.1 or 0.05. 

4.2 Expected Detection Delay 

After a change-point occurs, we are interested in the expected number of additional observations 
required for detection. For the detection rules considered in this paper, the maximum expected 
detection delay over k > is attained at k = 0. Hence we consider this case. 

We continue to use the notation of the preceding section. In particular g(x,p ) = log(l — p + 
Po exp[(a; + ) 2 /2]), and we let U denote a standard normal random variable. Recall that M denotes 
the set of sensors at which there is a change, M = \J\f\ is the cardinality of this set, and p = \J\f\/N is 
the true fraction of sensors that are affected by the change. For each n 6 M the mean value changes 
from to fi n > 0, and for n G M c the distribution remains the same as before the change-point. 
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Let i 2 

\n£Af J 

Note that the Kullback-Leibler divergence of a vector of observations after the change-point 
from a vector of observations before the change-point is A 2 /2, which determines the asymptotic 
rate of growth of the detection statistic after the change-point. Using Wald's identity [Sie85], we 
see to a first order approximation that the expected detection delay is 2b/ A 2 , provided that the 
maximum window size, mi, is large compared to this quantity In the following derivation we 
assume mi ^> 2b/ A 2 . 

In addition, let 

t 



& = ( 2 °) 



i=i 



be a random walk where the increments z% are independent and identically distributed with mean 
A 2 / 2 and variance A 2 . Let r = min |f : S t > Oj. Our approximation to the expected detection 
delay given below depends on two related quantities. The first is 

p(A) = iE{S T 2 }/E{£ r }, (21) 

for which exact computational expressions and useful approximations are available in [Sie85]. In 
particular 

oo oo 

p(A) = E{z 2 1 }/(2E{z 1 }) - jV 1 *^} = A 2 /4 + 1 - JV^Sr}, (22) 

i=l i=l 

where (x)~ = — min{a;, 0}. The second quantity is E{min t > S t }, which according to (Problem 8.14 
in [Sic85]) is given by 

E|min^}=p(A)-l-A 2 /4. (23) 

The following approximation refines the first order result for the expected detection delay. As 
b — > oo, with other parameters held fixed, 



E°{T 2 } = 2A~ 



b + p(A) - \M\ logp - \Af\/2 + E min SA - (N - \Af\)E{g(U, Po )} + o(l 



(24) 

The following calculation provides the ingredients for a proof of (24). For details in similar 
problems involving a single sequence, see [PS75] and [SV95]. For convenience we write T = T 2 . Let 
ko = b 1 / 2 . For k < T — k , we can write the detection statistic at the stopping time T as follows, 



10 



up to a term that tends to zero exponentially fast in probability: 



A' 



z k ,t = ^2g(Un )k , T ,Po) 



n=l 



J>g (p exp{(^ T ) 2 /2} 



i+^expi-^vm 

Po 



J + 9(U n<k , T ,Po) 



= £ HPo + «,,t) 2 /2] + £ g(U nAT , Po ) + £ log (l + ^exp {-{Ul KT f/2]\ 

= \M\ logpo + E«^t) 2 /2 + £ 9(VnMT>Po) + 

= |AT|logp + E[(^-^) + ] 2 /2(T-A;)+ £ s(fW, Po ) + o(l). 

nGAf nGA/" c 

(25) 

The residual term YlneAf 1°§ (l + (1 ~~ Po) exp { — (17+ fc T ) 2 /2} /po) tends to zero exponentially fast 
when b — > oo, because when b — > oo, T — >■ 6/A, and n £ A/", {U^ kT ) 2 grows on the order of 
j4(T - fc) > ^fc = £ Vb. 

We then use the following simple identity to decompose the second term in (25) for the affected 
sensors into two parts: 



(S+ ) 2 /2* = Sy 2f - (S- t ) 2 /2t 

= »n{Sn,t ~ lint/2) + (S n , t ~ /2t - {S~ t ) 2 /2t. 



(26) 



From the preceding discussion, we see that max < fc<r „ fco T is on the order of b, while max T _ fco < fc<r Z k ^ T 
is on the order of ko = b 1 ^ 2 . Hence with overwhelming probability the max over all k is attained for 
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k < T — k , so from (26) and (25) we have 



max Z kt 

0<k<T 



N 



0< ^ k X 9{Un ' k ^ po)+o{1] 

n=l 



-\J\f\ logon + m ax 

0<k<T-k 



Vn[(S n ,T - S n>k ) — (T — k)n n /2\ + ^T[(S„,t - S n>k ) - (T - fc)//»]7[2(T 



neAf 



-[(S n ,T - S n>k )'] 2 /2(T -k)+J2 9(Un,k,T,Po) 



neAf c 



+ o(l) 



--\J\f\ logpo + ^ A*n ('S'n.T - T fin/2) + 



neAf 



max 

0<fc<T-fc 



- J] ^ (<$»,* " W2) + ^ [(S n> T - S n;k ) — (T — fc) Mn ] 2 /[2(T - fc)] 

neA/" nGA/" 



-^K S ^T-Sn,kr] 2 /[2(T-k)]+ £ 9(Un,k,TiPo) 
n£Af n£Af c 



+ 0(1). 



(27) 



The following lemma forms the basis for the rest of the derivation. The proof is omitted here; 
for details see ([Xiell]) or ([SV95]) for the special case N — 1. 

Lemma 1. For k = b 1 ^ 2 , asymptotically as b — > oo 



max 

0<fc<T 



y, /in (s n ,k - k fi n /2) + y 



[{S n ,T - S njk ) - (T - k)fj, n ] 



neAf 



n&Af 



2(T - k) 



neJV 



[(Sn,T — S, 



n,k ) 



2(T - k) 



+ Y d(U n ,k,T,Po) 



n£Af c 



y^(S n ,T -Tfi n ) 2 /2T + y^g(U nfitT ,p Q )+ max 

' J ' ' 0<fc<fco 



neAf 



i£Af c 



^ fin(S n ,k - kfin/2) 
n&Af 



+ o p (l), 



where o p (l) converges to in probability. 
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By taking expectations in (27), letting b — > oo and using Lemma 1, we have 
E° < max y~]g(U ntkt T,Po) 

0<k<T ^— ' 
I " n=l 

=E° J \U\ logpo + ^{S n ,T - T ^/ 2 ) + E (gw,T ^ r r/Xw)2 + E 0(*W,Pb)+ (28) 

I nGAf neA/" re£Af c 



max 

0<fc<fc 



neAf 



oil). 



We will compute each term on the right hand side of (28) separately. We will need the lemma 
due to Anscombe and Doeblin (see Theorem 2.40 in [Sie85]), which states that the standardized 
randomly stopped sum of random variables are asymptotically normally distributed under quite 
general conditions. 

(i) By Wald's identity [Sic85]: 

(29) 



E° 1 »n(S n ,T ~ T/i„/2) 1 = E°{T}A 2 /2. 

IneAf J 



(ii) By the Anscombe-Doeblin Lemma, (S n> T — T^i^/T 1 ^ 2 is asymptotically normally distributed 
with zero mean and unit variance. Hence ^2 n€ j^{S n ,T — Tfi n ) 2 /T is asymptotically a sum of 
independent \\ random variables, so 



E ° { E^ - T ^f/2T 1 = \M\/2 + o(l). 



(30) 



(hi) Similarly, 



E° { 9(U n> o,T,Po) \^(N- \Af\)E°{g(U,p )}. (31) 



(iv) The term — YlneAf Mn(»Sn,fc — /U re fc/2) (k > 0) is a random walk with negative drift — A 2 /2 and 
variance A 2 . Hence E° {max <fc<fc [— J2 n eX ^(Sn^ — fc/i n /2)]} converges to the expected 
minimum of this random walk, which has the same distribution as min 4 >o St defined above. 

Having evaluated the right hand of (28), we now consider the left-hand side. The first order 
asymptotic behavior of the process ^ n =i 9(U n ,k,T, Po) is the same as that of J2 n &Af ^n(S n ,T—TfJ, n /2), 
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which has drift A 2 /2 and variance A 2 . By writing 



N 1 ( N 



E° I max YVtW^o) }=b + E°l max Y^g(U n ^ TjPo ) -b\, (32) 



0<k<T 0<fc<T- 

n=l ) \ n=l 



and using nonlinear renewal theory to evaluate the expected overshoot of the process of (20) over 
the boundary ([Sic85], Chapter IX), we obtain 



E° \ max J2g(U n>KT ,p ) - b \ -> p(A). (33) 



n=l 



Remark. An examination of the preceding argument indicates that the accuracy in practice of 
the approximation may be quite variable. In particular, variability in the \i n will be reflected in 
variability in the accuracy of the expansion of g{U n ,k,ti Po)- m particular if some of the \x n are close to 
0, there will not be the clear separation required by the derivation of the indices n into terms where 
the approximation applies and terms where it does not, with the result that the approximation may 
break down. 



4.3 Accuracy of the Approximations 

We start with examining the accuracy of our approximations for the ARL and the expected detection 
delay in (16) and (24). For a Monte Carlo experiment we use = 100 sensors, mi = 200 and \i n = 1 
for all affected data streams. The comparisons between the theoretical and Monte Carlo ARLs for 
different values of po are given in Table 1. Numerical results obtained from 500 Monte Carlo trials 
are given in Table 1, where they show that the approximation in (16) is quite accurate. 

Table 1: Average run length (ARL) of T 2 (po)? m i = 200. 



Po 


b 


Theory 


Monte Carlo 


0.3 


31.2 


5001 


5504 


0.3 


32.3 


10002 


10221 


0.1 


19.5 


5000 


4968 


0.1 


20.4 


10001 


10093 


0.03 


12.7 


5001 


4830 


0.03 


13.5 


10001 


9948 



Figure 1 illustrates the fact that (for example) T 2 (0.1) is approximately exponentially distributed. 
Results for the expected detection delay obtained from 500 Monte Carlo trialsare given in Table 
2. The approximation for the expected detection delay does not appear to be as accurate as the 
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2000 4000 6000 8000 10000 

m 

Figure 1: The tail probability P{T 2 (0.1) > m}. Theoretic values are calculated from (16). 

approximation for the ARL. Since the expected detection delay requires considerably less compu- 
tational effort to simulate and needs to be known only roughly when we choose design parameters 
for a particular problem, we are less concerned about its accuracy. 

Table 2: Expected detection delay of the mixture procedure with ARL « 5000, \i — 1, and m\ = 200. 



p 


Po 


b 


Theory 


Monte Carlo 


0.3 


0.3 


31.2 


3.5 


3.2 


0.1 


0.3 


31.2 


6.2 


6.5 


0.3 


0.1 


19.5 


5.2 


3.6 


0.1 


0.1 


19.5 


7.2 


6.7 


0.03 


0.1 


19.5 


13.9 


14.3 


0.03 


0.03 


12.7 


13.9 


14.2 



We have performed considerably more extensive simulations that yield results consistent with 
the small experiments reported in Tables 1 and 2. Since the parameter po defining T 2 must be 
chosen subjectively, it is interesting to observe that Table 2 suggests this procedure is reasonably 
robust with respect to the choice of p$, and choosing p somewhat too large is less costly than 
choosing po too small. More extensive calculations bear out this observation. We return to the 
problem of choosing po in Section 5.1. 



15 



5 Numerical Comparisons 



In this section, we compare the expected detection delays for several procedures when their ARLs 
are all approximately 5000. The thresholds are given in Table 3, where we assume N = 100, and 
mi = 200 for those procedures for which a limited window size is appropriate. The procedure 
(7) is denoted by T 2 (po). For Mei's procedure we put 5 n = 1. The procedures (9) are denoted by 
T 3 (p , S). Recall that T 2 (l) uses the generalized likelihood ratio statistic and T 3 (l, 5) is similar to the 
procedure proposed by Tartakovsky and Veeravalli (2008), but we have inserted the positive part to 
avoid the problems discussed above with this procedure when a relatively small number of sensors 
are affected by the changepoint. The expected detection delays are obtained from 500 Monte Carlo 
trials and are listed in Table 4. For some entries values from our asymptotic approximation are 
given in parentheses. 

Note that the max procedure (8) has the smallest detection delay when p = 0.01, but it has the 
largest delay for p greater than 0.1. The procedures defined by T 2 and by T 3 are comparable. Mei's 
procedure performs well when p is large, but poorly when p is small. 

Table 3: Thresholds for ARL « 5000, m 1 = 200. 



Procedure 


b 


Monte Carlo ARL 


Max 


12.8 


5041 


Ui) 


53.5 


4978 


T 2 (0.1) 


19.5 


5000 


Mei 


88.5 


4997 


T 3 (.l,l) 


12.4 


4948 


7KM) 


41.6 


4993 



16 



Table 4: Expected Detection Delays with N = 100 obtained from 500 Monte Carlo trials. Thresh- 
olds for ARL 5000 are listed in Table 3. Theoretical approximations for the expected detection 
delay are in parentheses. 



p 


1\ T^4-T ^ J 

Method 


TATA , , 1 

UU, fl — i 


TATA , , n 7 

DD, fl — [).( 


TATA , , IO 
JJD, /X = 1.0 


n m 


max 


£ 

zo.o 


Add 


1 CI O 
iD.O 




T 2 (l) 


52.3(56.9) 


105.5(114.6) 


32.9(34.1) 




rri / 1 \ 

T 2(-l) 


31.6(32.5) 


59.4(64.9) 


on / 1 n 7\ 

20.3 (19.7) 




A T , s ^ 

Mei 


ro n 
OO. Z 


1 no 


08.1 




T 3 (.l,l) 


29.1 


63.3 


19.1 




33(1, lj 


00 n 
oZ.U 


OI O 7 

zlo. f 


ro 
OO.O 


n no 

(J. (Jo 


max 


18.1 


OO 

33.3 


11.0 




^ 2 (1) 


18.7(19.3) 


35.8(38.4) 


12.6(11.7) 




r 2 (.l) 


"1 A O / 1 O /^i \ 

14.2(13.9) 


26.7(27.5) 


9.3(8.5) 




Mei 


oo n 

23.0 


41.6 


16.4 






13.4 


26.9 


9.2 






07 O 

z / .z 


bb.U 


lo.o 


n ntx 


max 


1 E E 
10. 


OO A 
ZO.4 


y. 1 




^ 2 (l) 


12.2(11.6) 


21.8(23.0) 


7.9(7.1) 




rn / i \ 

^(•1) 


10.4(10.1) 


18.9(19.9) 


6.9(6.2) 




A T , , ^ 

Mei 


lo. ( 


ZO.y 


11.4 




T 3 (.l,l) 


9.8 


18.6 


7.0 




T 1 1 1 "\ 
M 1 ; 1) 


1 r p 
10.0 


OO O 
OO.O 


n n 

y.u 


n 1 

0.1 


max 


lz.6 


00 n 

23.0 


O A 

8.4 




m / 1 \ 


6.7(5.9) 


11.8(11.3) 


4.7(3.7) 




T2U) 


6.7(7.2) 


11.6(14.1) 


4.6(4.5) 




A T , s ^ 

Mei 




15.4 


7 /I 

f .4 




m l 1 1 \ 

r 3 (.l,l) 


7.1 


11.9 


5.3 




^3(1,1) 


6.8 


1 7 

15.7 


4.6 


n o 

0.3 


max 


9.6 


1 C* 7 

16.7 


6.6 




r 2 (l) 


3.0(2.0) 


4.4(3.5) 


2.4(1.4) 




m / -1 \ 


3.5(5.2) 


5.6(10.1) 


2.7(3.3) 




A T , , ^ 

Mei 


a n 
4.9 


7 n 
r .U 


a n 
4.U 




m / 1 1 \ 

r 3 (.l,l) 


A 

4.6 


6.7 


3.9 




T 1 ( 1 1 \ 


n 

3.0 


/i 

4.3 


O C 

2.5 


0.5 


max 


8.6 


1 A A 

14.4 


5.8 




m / -1 \ 


2.3 


3.0 


2.0 




T2U) 


2.8 


4.0 


2.1 






O.8 


o.U 


n 
o.U 




T 3 (.l,l) 


4.0 


5.4 


3.3 




r 3 (i,i) 


2.3 


3.0 


2.0 


1 


max 


7.2 


12.1 


5.1 






2.0 


2.0 


2.0 




r 2 (.i) 


2.0 


2.6 


2.0 




Mei 


3.0 


3.4 


2.3 




T 3 (.l,l) 


3.4 


4.3 


3.0 




r s (i,i) 


2.0 


2.1 


2.0 
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5.1 Parallel Mixture Procedure 



The procedures considered above depend on a parameter p 0l which presumably should be chosen 
to be close to the unknown true fraction p. While Table 4 suggests that the value of po is fairly 
robust, to achieve robustness over a wider range of the unknown parameter p, we consider a parallel 
procedure that combines several procedures, each using a different po to monitor a different range 
of p values. The thresholds of these individual procedures will be chosen so that they have the 
same ARL. For example, we can use two different values of po, say a small p$ = p\ and a large 
Po — P2-, and then choose thresholds b\ and 6 2 to obtain the same ARL for these two procedures. 
The parallel procedure claims a detection if at least one of the component procedures reaches its 
threshold, specifically 

^parallel = Hiin{T 2 (pi ) , T 2 (p 2 ) } . (34) 

To compare the performance of the parallel procedure with that of a single T 2 , we consider a 
case with N = 400 and mi = 200. For the single mixture procedure we use the intermediate value 
po = 0.10 and threshold value b = 44.7, so P°°{T 2 < 1000} « 0.10 and hence the ARL « 10000. 
For the parallel procedure we consider the values p\ = 0.02 and p 2 = 0.33. For the threshold values 
6i = 21.2 and b 2 = 87.7, respectively, we have P°°{T 2 (pi) < 1000} « 0.05. By the Bonferroni 
inequality P°°{min[T 2 (pi), T 2 (p 2 )} < 1000} < 0.1, so conservatively E°°{Tp arallc i} > 10000. Table 5 
shows that the expected detection delays of the parallel procedure are usually smaller than those of 
the single procedure, particularly for very small or very large p. Presumably these differences are 
magnified in "larger" problems. 

Table 5: Comparison of Parallel and Simple Procedures 



Procedure 


P 




Expected Detection Delay 


r 2 (.i) 


0.1 


0.7 


6.5 




0.005 


1.0 


27.1 




0.005 


0.7 


54.5 




0.25 


0.3 


12.0 


Parallel Procedure 


0.1 


0.7 


6.4 




0.005 


1.0 


22.9 




0.005 


0.7 


45.8 




0.25 


0.3 


10.5 



Simulations indicate that because of dependence between the two statistics used to define the 
parallel procedure, the ARL is actually somewhat larger than the Bonferroni approximation sug- 
gested. Since the parallel procedure becomes increasingly attractive in larger problems, which 
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provide more room for improvement over a single choice of po, but which are also increasingly dif- 
ficult to simulate, it would be interesting to develop a more accurate theoretical approximation for 
the ARL. 

An attractive alternative to the parallel procedure would be to use a weighted linear combination 
for different values of po of the statistics used to define T2 or T3. Our approximation for the ARL 
can be easily adapted, but some modest numerical exploration suggests that the expected detection 
delay is not improved as much as for the parallel procedure. 

6 Profile-based Procedure for Structured Problems 

Up to now we have assumed there is no spatial structure relating the change-point amplitudes 
at difference sensors. In this section we will consider briefly a structured problem, where there is a 
parameterized profile of the amplitude of the signal seen at each sensor that is based on the distance 
of the sensor to the source of the signal. Assuming we have some knowledge about such a profile, 
we can incorporate this knowledge into the definition of an appropriate detection statistic. Our 
developments follow closely the analysis in ([SY08]). 

Assume the location of the nth sensor given by its coordinates x n , n = 1, • • • , N at points in 
Euclidean space, which for simplicity we take to be on an equi-spaced grid. We assume that the 
source is located in a region V, which is a subset of the ambient Euclidean space. In our example 
below we consider two dimensional space, but three dimensions would also be quite reasonable. 
Assume the change-point amplitude at the nth sensor is determined by the expression 

M 

= r m a Zm (x n ), (35) 

m=l 

where M is the number of sources, z m £ D is the (unknown) spatial location of the mth source, 
a z (x) is the profile function, and the scalar r m is an unknown parameter that measures the strength 
of the mth signal. The profile function describes how the signal strength of the mth point source 
has decayed at the nth sensor. We assume some knowledge about this profile function is available. 
For example, a z (x) is often taken to be a decreasing function of the Euclidean distance between z 
and x. The profile may also depend on finitely many parameters, such as the rate of decay of the 
function. See [Rab94] or [SSSW03] for examples in a fixed sample context. 

If the parameters r m are multiplied by a positive constant and the profile ct Zm (x n ) divided by the 
same constant, the values of \x n do not change. To avoid this lack of identifiability, it is convenient 
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to assume that for all z the profiles have been standardized to have unit Euclidean norm, i.e., 



6.1 Profile-based procedure 



Under the assumption that there is at most one source, say at z, for observations up to time t with 
a change-point assumed to equal k, the log likelihood function for observations from all sensors (1) 
is 

N 

£(t, k, r, z) = ^2[ra z (x n )(S n>t - S n>k ) -r 2 (t- k)a 2 z (x n ) /2}. (36) 



71=1 



When maximized with respect to r this becomes 



(37) 



Maximizing the function (37) with respect to the putative change-point k and the source location 
z, we obtain the log GLR statistic and a profile-based stopping rule of the form 



^profile = m f { t : max max 

t—mi<k<t—rno zdT> 



> b 



(3* 



If the model is correct, (38) is a matched-filter type of statistic. 



6.2 Theoretical ARL of profile-based procedure 

Using the result presented in [SY08], we can derive an approximation for the ARL of the profile- 
based procedure. We consider in detail a special case where d = 2 and the profile is given by a 
Gaussian function 

aJx) = — = e -^ llx - zll \ x G M 2 , P > 0. (39) 

The parameter (3 > controls of rate of profile decay and is assumed known. With minor modifi- 
cations one could also maximize with respect to a range of values of /3. 

Although the sensors have been assumed to be located on the integer lattice of two-dimensional 
Euclidean space, it will be convenient as a very rough approximation to assume that summation 
over sensor locations x can be approximated by integration over the entire Euclidean space. With 
this approximation, J2 x a ^( x )y which we have assumed equals 1 for all z, becomes L 2 al(x)dx, 
which by (39) is readily seen to be identically 1. The approximation is reasonable if /3 is large, so 
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the effective distance between points of the grid is small, and the space T>, assumed to contain the 
signal, is well within the set of sensor locations (so edge effects can be ignored and the integration 
extended over all of IR 2 ). 

It will be convenient to use the notation 

(f,g)= [ f(x)g(x)dx. (40) 

Let a z denote the gradient of a z with respect to z. Then according to [SY08], 

'(b/ mi y 

To evaluate the last integral in (41), we see from (39) that a z satisfies 



r (b/ mo y/ 2 r 

P°°{T prome < m} ~ mexp(-6/2)(6/47r) 3/2 2 1/2 / uv 2 {u)du / [det| (a z , dj) \ 1/2 dz, (41) 

i(Vmi) 1 / 2 JV 



a z (x) = a z (x)(x-z)/(2/3). (42) 

Hence by (40) (a z , dj) is a 2 x 2 matrix of integrals, which can be easily evaluated, and its determi- 
nant equals l/(16/3 4 ). Hence the last integral in (41) equals |P|/(4/3 2 ) where \T>\ denotes the area 
of T>. Arguing as above from the asymptotic exponentiality of T pro fii e , we find that an asymptotic 
approximation for the average run length is given by 



b N " 3/2 

E°°{T profilc } ~ exp(6/2) 



Itt 



(b/mo) 1 / 2 

uv 2 {u)du-\V\/{2 1/2 (3 2 ) 

(6/mi)V2 



(43) 



6.3 Numerical examples 



In this section we briefly compare the unstructured detection procedure based on T 2 with the profile 
procedure in the special case that the assumed profile is correct. 

Assume that the profile is given by the Gaussian function (39) with parameter (3 = 1 and both 
procedures are window-truncated with m = 1, m\ = 100. The number of sensors is N = 625 
distributed over a 25 x 25 square grid with center at the origin. In this situation, approximately 
p = 0.016 sensors are affected. In the specification of T 2 we take po = 0.05. 

The thresholds are chosen so that the average run lengths when there is no changepoint are 
approximately 5000. Using (41), we obtain P tX3 {T profi i e < 250} = 0.050 for b = 29.5. From 500 
Monte Carlo trials we obtained the threshold 26.3, so the theoretical approximation appears to be 
slightly conservative. 
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To deal with a failure to know the true rate of decay of the signal with distance, we could 
maximize over /3, say for /3 G [0.5,5]. A suitable version of (41) indicates the threshold would be 
33.8. This slight increase to the threshold suggests that failure to know the appropriate rate of 
decay of the signal with distance leads to a relatively moderate loss of detection efficiency 

For comparisons of the expected detection delay, we used for the profile procedure the threshold 
26.3, given by simulation, while for T 2 (0.05) we used the analytic approximation, which our studies 
have shown to be very accurate. Table 6 compares the expected detection delay of the profile based 
procedure with that of the mixture procedure. As one would expect from the precise modeling 
assumptions, the profile procedure is substantially more powerful. 

In many cases there will be little scientific basis for the assumed profile, so it would also be 
interesting to compare the structured and the unstructured problems when the assumed profile 
differs moderately or substantially from the true profile, perhaps in the number of sources of the 
signals, their shape, the rate of decay, or the locations of the sensors. 



Table 6: Comparison of Expected Detection Delays (EDD) 





b 


EDD r = 1 


EDD, r = 1.5 


Profile-based procedure 
Unstructured procedure 


26.3 
39.7 


25.65±10.3 
78.26±30.9 


12.33±4.4 
35.79±11.1 
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7 Discussion 



For an unstructured multi-sensor change-point problem we have suggested and compared a number 
of sequential detection procedures. We assume that the pre- and post- change samples are normal 
distributed with known variance and that both the post-change mean and the set of affected sensors 
are unknown. For performance analysis, we have derived approximations for the average run length 
(ARL) and the expected detection delay, and have shown that these approximations have reasonable 
accuracy. Our principal procedure depends on an assumed fraction of sensors affected by the change- 
point. We show numerically that the procedures are fairly robust with respect to discrepancies 
between the actual and the hypothesized fractions, and we suggest a parallel procedure based on 
two or more hypothesized fractions to increase this robustness. 

In a structured problem, we have shown that knowledge of the correct structure can be imple- 
mented to achieve large improvements in the expected detection delay. An open question is the 
extent to which failure to hypothesize the appropriate structure compromises these improvements. 
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